The Effect of Text Difficulty on Machine Translation Performance -- A Pilot Study with ILR-Rated Texts in Spanish, Farsi, Arabic, Russian and Korean

Authors

  • Ray Clifford
  • Neil Granoien
  • Douglas A. Jones
  • Wade Shen
  • Clifford J. Weinstein
Abstract

We report on initial experiments that examine the relationship between automated measures of machine translation (MT) performance (Doddington, 2003; Papineni et al., 2001) and the Interagency Language Roundtable (ILR) scale of language proficiency/difficulty, which has been in standard use for U.S. government language training and assessment for the past several decades (Child, Clifford and Lowe, 1993). The main question we ask is how technology-oriented measures of MT performance relate to the ILR difficulty levels, where we understand that a linguist with ILR proficiency level N is expected to be able to understand a document rated at level N, but to have increasing difficulty with documents at higher levels. In this paper, we find that some key aspects of MT performance track with ILR difficulty levels, primarily for MT output whose quality is good enough to be readable by human readers.

This work was sponsored by the Defense Language Institute under Air Force Contract number F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Introduction

Current automated MT scoring techniques do not specifically consider the difficulty of the input text when evaluating performance. We analyze MT performance with respect to input text difficulty and scoring methods, focusing our study on the behavior of the official NIST MT evaluation scoring package, which is based on the IBM BLEU scoring tool. We introduce a corpus of rated texts selected from five different languages, with accompanying reference translations. Using the reference translations in this corpus, we conducted a variety of experiments that examine the difficulty-performance relationship. Some of the experiments address properties of the texts that may affect MT components (e.g., more difficult text may be more difficult to parse), whereas other experiments address MT performance in terms of NIST/BLEU scores.
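The NIST/BLEU scores discussed above rest on modified n-gram precision of MT output against multiple reference translations, combined with a brevity penalty. The following is a minimal, self-contained sketch of that idea in Python; it is illustrative only, not the official NIST `mteval` implementation (smoothing, corpus-level aggregation, and the NIST information-weighted variant are omitted):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            return 0.0  # candidate too short to form any n-gram
        # Clip each candidate n-gram count by its maximum count
        # in any single reference (the "modified" precision).
        max_ref = Counter()
        for ref in refs:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # unsmoothed: any zero precision zeroes the score
        log_prec_sum += math.log(clipped / sum(cand_counts.values()))
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in refs), key=lambda l: (abs(l - len(cand)), l))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)
```

For the Level 1 Arabic example above, `bleu("required buying a car", ["A car needed for purchase"])` is 0 under this unsmoothed sketch (no 4-gram overlap), which illustrates why sentence-level BLEU on short, low-overlap output is noisy and why the paper's analysis is conducted over rated document sets rather than isolated sentences.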
“SPARK” Microcorpus Rated for ILR Difficulty

Language instructors for Spanish, Farsi (Persian), Arabic, Russian, and Korean at the U.S. Defense Language Institute (DLI) have selected and rated a small collection of documents at each of seven difficulty levels, across a range of topical domains, for the purpose of exploring the relationship between MT performance and input text difficulty. The Linguistic Data Consortium has agreed to make an online version of this corpus available to the MT research community. Each text is accompanied by at least four English reference translations and a commentary on its difficulty level.

Overview of the ILR Difficulty Levels

The ILR skill levels are an integral part of foreign language skill assessment in a variety of settings for agencies in the U.S. Government. A description of an ILR-based text classification scheme can be found in (Child et al., 1993; Lowe, 1999) and on the web (see References); some key points:

• Level 1 texts contain short, discrete, simple sentences; generally pertain to the immediate time frame; are often written in an orientational mode; and require elementary-level reading skill. Example: newspaper announcements.
• Level 2 texts convey facts with the purpose of exchanging information; do not editorialize on the facts; are often written in an instructive mode; and require limited working proficiency. Example: newswire articles; TIDES/MT evaluation data.
• Level 3 texts have denser syntax and highly analytic expressions; place greater conceptual demands on the reader; are often written in an evaluative mode; may require the reader to ‘read between the lines’; and require general professional proficiency. Example: newspaper opinion/editorial articles.
• Level 4 texts express creative thinking; assume a relative lack of shared personal information; often involve a highly individualized mode that projects the style of the author; and require advanced professional proficiency. Example: essays; political editorials that reformulate social, economic, or political policy.

Figure 1 shows a sampling of Spanish, Farsi, Arabic, Russian, and Korean text segments in the SPARK corpus. To save space, only one example is shown for each of the seven difficulty levels in our corpus [1, 1+, 2, 2+, 3, 3+, 4]. Some basic statistics about the corpus are shown in Figure 2.

Arabic – Level 1 (Car Sale Advertisement)
Src: مطلوب شراء سيارة
Ref: A car needed for purchase
MT:  required buying a car


Similar resources

Multi-lingual Text Leveling

Determining the language proficiency level required to understand a given text is a key requirement in vetting documents for use in second language learning. In this work, we describe our approach for developing an automatic text analytic to estimate the text difficulty level using the Interagency Language Roundtable (ILR) proficiency scale. The approach we take is to use machine translation to...


Determining the Boundaries and Types of Syntactic Phrases in Farsi Texts

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenization of syntactic phrases, known as chunking, is an important preprocessing step needed in many applications such as machine translation, information retrieval, and text-to-speech. In this paper, chunking of Farsi texts is done using statistical and learning methods, and the grammat...


SMT at the International Maritime Organization: experiences with combining in-house corpora with out-of-domain corpora

This paper presents a machine translation tool, based on Moses, developed for the International Maritime Organization (IMO) for the automatic translation of documents between Spanish, French, Russian, and Arabic and English. The main challenge lies in the insufficient size of the in-house corpora (especially for Russian and Arabic). The United Nations (UN) granted IMO the right to use UN resources...


Translation Evaluation in Educational Settings for Training Purposes

The following article describes different methods and techniques used in educational settings for translation evaluation. Translation evaluation is the placing of a value on a translation, i.e., awarding a mark, even if only a binary pass/fail one. In the present study, different features of the texts chosen for evaluation were first considered, and then scoring the t...


ILR-Based MT Comprehension Test with Multi-Level Questions

We present results from a new Interagency Language Roundtable (ILR) based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the MT condition, for both text and audio ...




Publication date: 2004